Correct parts extraction from speech recognition results using semantic distance calculation, and its application to speech translation

Abstract

This paper proposes a method for extracting the correct parts from speech recognition results by using an example-based approach for parsing results that include several recognition errors. Correct parts are extracted using two factors: (1) the semantic distance between the input expression and an example expression, and (2) the structure selected by the shortest semantic distance. We examined the correct parts extraction rate and the effectiveness of the method in improving the speech understanding rate and the speech translation rate. The results showed that the proposed method is able to efficiently extract the correct parts from speech recognition results. About ninety-six percent of the extracted parts are correct. The results also showed that the proposed method is effective in understanding misrecognized speech sentences and in improving speech translation results. The misunderstanding rate for erroneous sentences is reduced by about half. Sixty-nine percent of speech translation results are improved for misrecognized sentences.

*Now working at Toyo Information Systems Co., Ltd.

1 Introduction

In continuous speech recognition, N-grams have been widely used as effective linguistic constraints for spontaneous speech [1]. To reduce the search effort, a high-order N-gram can be quite powerful, but building the large corpus needed to estimate a reliable high-order N-gram is unrealistic. As a realistic linguistic constraint, almost all speech recognition systems use a low-order N-gram, such as a bigram or trigram, which can constrain only local parts. This is one of the reasons why many sentences misrecognized under N-gram constraints are strange over long stretches spanning more than N words. During the recognition process, several candidates have to be pruned if the beam width is too small, and the pruning can rely only on the local parts already recognized.
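The locality problem described above can be illustrated with a small sketch. It is not taken from the paper: the toy corpus, the function names, and the test sentence are all hypothetical, and the model is reduced to a check of whether each adjacent word pair was seen in training.

```python
from collections import Counter

# Toy training corpus (hypothetical; not from the paper).
corpus = [
    "i would like to reserve a room",
    "i would like to go to the station",
    "the room is near the station",
]

def bigrams(words):
    """Return adjacent word pairs, including sentence boundary markers."""
    padded = ["<s>"] + words + ["</s>"]
    return list(zip(padded, padded[1:]))

# Count every bigram that occurs in the training corpus.
seen = Counter(bg for line in corpus for bg in bigrams(line.split()))

def all_bigrams_seen(sentence):
    """True if every adjacent pair in the sentence occurred in training."""
    return all(seen[bg] > 0 for bg in bigrams(sentence.split()))

# Every adjacent pair is attested, yet the sentence as a whole is strange.
odd = "i would like to the room is near the station"
print(all_bigrams_seen(odd))  # → True
```

A bigram constraint alone therefore cannot reject this candidate, because the oddity only shows up over a span longer than two words.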
Even if we could get a large enough corpus to train a high-order N-gram, it would be impossible to determine the best recognition candidate in consideration of the whole sentence. To put a speech dialogue system or a speech translation system into practical use, it is necessary to develop a mechanism that can parse misrecognized results using global linguistic constraints.

Several methods have already been proposed to parse ill-formed sentences or phrases using global linguistic constraints based on a context-free grammar (CFG) framework, and their effectiveness against some misrecognized speech sentences has been confirmed [2, 3]. Such parsers are also used for translation (see, for example, the use of the GLR parser in Janus [4]). In these studies, even if parsing was unsuccessful for erroneous parts, it could be continued by deleting or recovering the erroneous parts. The parsing was done on the assumption that every input sentence is well-formed after all erroneous parts are recovered. In reality, however, spontaneous speech contains many ill-formed sentences, and it is difficult to analyze every spontaneous sentence within the CFG framework.

Within the CFG framework, syntactic rules written as subtrees have also been proposed [5]. Even if a whole sentence cannot be analyzed by a CFG, the sentence can be expressed by combining several subtrees. The subtrees are effective in parsing parts of spontaneous speech. Still, because subtrees basically deal only with local parts, as in N-gram modeling, they are not sufficient for parsing misrecognized sentences. Furthermore, the subtrees are not sufficient for extracting suitable, meaningful candidate structures, because these linguistic constraints are grammatical constraints without semantics.
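The two extraction factors named in the abstract, a semantic distance to example expressions and selection of the nearest example, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy thesaurus, the class-path distance, the slot alignment, the acceptance threshold, and all names are hypothetical.

```python
# Hypothetical toy thesaurus: each word maps to its class path, most general first.
THESAURUS = {
    "room":    ("place", "lodging"),
    "hotel":   ("place", "lodging"),
    "station": ("place", "transport"),
    "monday":  ("time", "day"),
    "tuesday": ("time", "day"),
}
DEPTH = 2  # every path in the toy thesaurus has this length

def word_distance(a, b):
    """0.0 for identical words or the same class, 1.0 if nothing is shared."""
    if a == b:
        return 0.0
    path_a, path_b = THESAURUS.get(a), THESAURUS.get(b)
    if path_a is None or path_b is None:
        return 1.0  # assumption: unknown words are maximally distant
    shared = 0
    for class_a, class_b in zip(path_a, path_b):
        if class_a != class_b:
            break
        shared += 1
    return (DEPTH - shared) / DEPTH

def expression_distance(slots, example):
    """Mean word distance over the aligned slots of one pattern."""
    return sum(word_distance(s, e) for s, e in zip(slots, example)) / len(example)

def nearest_example(slots, examples):
    """Return (distance, example) for the closest stored example."""
    return min((expression_distance(slots, ex), ex) for ex in examples)

def extract_correct_parts(parts, examples, threshold):
    """Keep parts whose nearest example lies within the assumed threshold."""
    return [p for p in parts if nearest_example(p, examples)[0] <= threshold]

EXAMPLES = [("room", "monday")]                      # stored example expressions
PARTS = [("hotel", "tuesday"), ("station", "room")]  # candidate parts of a result
print(extract_correct_parts(PARTS, EXAMPLES, threshold=0.3))
```

In this toy setting the first candidate matches the stored example at distance 0.0 and is kept, while the second is at distance 0.75 and is rejected as a likely misrecognized part.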

Similar resources

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Comparison of Different Machine Learning Methods in Extractive Persian Speech-to-Speech Summarization Without Using Transcripts

In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization deals with extracting important and salient segments from speech in order to access, search, extract and browse speech files more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...

Multi-lingual Spoken Dialog Translation System Using Transfer-driven Machine Translation

This paper describes a Transfer-Driven Machine Translation (TDMT) system as a prototype for efficient multi-lingual spoken-dialog translation. Currently, the TDMT system deals with dialogues in the travel domain, such as travel scheduling, hotel reservation, and trouble-shooting, and covers almost all expressions presented in commercially-available travel conversation guides. In addition, to pu...

Design and Implementation of an Intelligent Part of Speech Generator

The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...

Publication date: 2002